Reads were aligned to the hg38 assembly using BWA (0.7.10-r789). The following alignment QC report was produced:
GSE95632_DiffBind_Report.html
Peaks were called with the callpeak function of MACS2 (2.1.1), and peaks within blacklisted regions were filtered out.
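The report does not say how overlap with the blacklist was defined. The sketch below assumes a peak is dropped if it shares at least one base with any blacklisted region, using BED-style half-open coordinates; the interval tuples and function names are illustrative, not the pipeline's actual data structures.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that do not overlap any blacklisted region (assumed rule)."""
    return [p for p in peaks if not any(overlaps(p, bl) for bl in blacklist)]


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 550, 1000)]
kept = filter_blacklisted(peaks, blacklist)
print(kept)  # [('chr1', 100, 200), ('chr2', 100, 200)]
```

The linear scan is fine for a toy example; for genome-wide peak sets an interval tree or a sorted sweep would be the usual choice.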
DiffBind (2.6.6) was used for differential binding site analysis, based on the phenotype file provided. Normalized counts are saved in the following text file:
GSE95632_Condition1_vs_Condition0_counts_normalized_by_diffbind.csv
Differential binding site analysis was done for all comparisons provided in the comparisons file, using DESeq2 by default.
Peaks were annotated with ChIPseeker (1.14.2), using the corresponding R annotation packages EnsDb.Hsapiens.v79 and TxDb.Hsapiens.UCSC.hg38.knownGene.
Create BAM read columns
Create BAM control columns
Create peak BED columns
Prepare DiffBind CSV input file
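DiffBind reads its sample sheet as a CSV with columns such as SampleID, Tissue, Factor, Condition, Treatment, Replicate, bamReads, bamControl, Peaks and PeakCaller. The sketch below writes such a file for the four GR samples listed later in this report. The BAM and BED paths, and the use of one input-control BAM per sample, are hypothetical placeholders, not the pipeline's actual file layout.

```python
import csv
import io

# Column names follow the DiffBind sample-sheet convention.
FIELDS = ["SampleID", "Tissue", "Factor", "Condition", "Treatment",
          "Replicate", "bamReads", "bamControl", "Peaks", "PeakCaller"]

# (sample ID, condition, treatment, replicate) for the GR samples in this report
samples = [
    ("SRR5309351", "cond0", "EtOH", 1),
    ("SRR5309353", "cond0", "EtOH", 2),
    ("SRR5309354", "cond1", "Dex", 1),
    ("SRR5309356", "cond1", "Dex", 2),
]


def build_sample_sheet(samples, tissue="ASM", factor="GR"):
    """Return DiffBind-style sample-sheet CSV text for the given samples."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for sample_id, condition, treatment, rep in samples:
        writer.writerow({
            "SampleID": sample_id, "Tissue": tissue, "Factor": factor,
            "Condition": condition, "Treatment": treatment, "Replicate": rep,
            # Hypothetical file layout: one read BAM, one control BAM, one BED per sample.
            "bamReads": f"bam/{sample_id}.bam",
            "bamControl": f"bam/{sample_id}_input.bam",
            "Peaks": f"peaks/{sample_id}_peaks.bed",
            "PeakCaller": "bed",
        })
    return buf.getvalue()


sheet = build_sample_sheet(samples)
print(sheet)
```

Writing the sheet programmatically keeps the Condition and Replicate columns consistent with the phenotype file, which the later replicate check depends on.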
Check if there are replicates in each condition.
## nrep= 2
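The report prints nrep but does not define it. One plausible reading, assumed here, is the smallest number of samples found in any condition, so that nrep = 2 means every condition has at least two replicates:

```python
from collections import Counter
from typing import List


def min_replicates(conditions: List[str]) -> int:
    """Smallest number of samples in any condition (assumed meaning of nrep)."""
    return min(Counter(conditions).values())


# Condition labels of the four GR samples
conditions = ["cond0", "cond0", "cond1", "cond1"]
nrep = min_replicates(conditions)
has_replicates = nrep > 1
print("nrep=", nrep)  # nrep= 2
```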
If there are replicates in each condition, perform the following steps:
Obtain consensus peaks for each condition in the comparison
6 Samples, 81269 sites in matrix (104565 total):
| # | ID | Tissue | Factor | Condition | Treatment | Replicate | Caller | Intervals |
|---|---|---|---|---|---|---|---|---|
| 1 | SRR5309351 | ASM | GR | cond0 | EtOH | 1 | bed | 68779 |
| 2 | SRR5309353 | ASM | GR | cond0 | EtOH | 2 | bed | 66637 |
| 3 | SRR5309354 | ASM | GR | cond1 | Dex | 1 | bed | 70852 |
| 4 | SRR5309356 | ASM | GR | cond1 | Dex | 2 | bed | 79376 |
| 5 | cond0 | ASM | GR | cond0 | EtOH | 1-2 | bed | 57139 |
| 6 | cond1 | ASM | GR | cond1 | Dex | 1-2 | bed | 63064 |
Obtain the union of the consensus peaks from each condition
3 Samples, 78325 sites in matrix:
| # | ID | Tissue | Factor | Condition | Treatment | Replicate | Caller | Intervals |
|---|---|---|---|---|---|---|---|---|
| 1 | cond0 | ASM | GR | cond0 | EtOH | 1-2 | bed | 57139 |
| 2 | cond1 | ASM | GR | cond1 | Dex | 1-2 | bed | 63064 |
| 3 | cond0-cond1 | ASM | GR | cond0-cond1 | EtOH-Dex | 1-2 | bed | 78325 |
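The consensus and union steps can be sketched in plain Python. The code merges overlapping peaks across peaksets and keeps a merged region if peaks from at least min_overlap peaksets fall inside it: min_overlap = 2 gives a two-replicate consensus per condition, and min_overlap = 1 gives the union of the two consensus sets. This mirrors the idea behind DiffBind's minOverlap setting but is a simplified illustration, not DiffBind's implementation; the toy intervals are made up.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals on the same chromosome into maximal regions."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            _, m_start, m_end = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def consensus(peaksets: Dict[str, List[Interval]], min_overlap: int) -> List[Interval]:
    """Keep merged regions supported by peaks from at least `min_overlap` peaksets."""
    all_peaks = [p for peaks in peaksets.values() for p in peaks]
    kept = []
    for chrom, start, end in merge_intervals(all_peaks):
        # Count how many peaksets contribute at least one peak to this region.
        support = sum(
            any(c == chrom and s < end and start < e for c, s, e in peaks)
            for peaks in peaksets.values()
        )
        if support >= min_overlap:
            kept.append((chrom, start, end))
    return kept


cond0 = consensus({"rep1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
                   "rep2": [("chr1", 150, 250), ("chr2", 50, 80)]}, min_overlap=2)
cond1 = consensus({"rep1": [("chr1", 220, 300), ("chr3", 10, 90)],
                   "rep2": [("chr1", 230, 320), ("chr3", 40, 120)]}, min_overlap=2)
union = consensus({"cond0": cond0, "cond1": cond1}, min_overlap=1)
print(cond0)  # [('chr1', 100, 250)]
print(union)  # [('chr1', 100, 320), ('chr3', 10, 120)]
```

Because merged regions grow whenever peaks overlap, the union can contain fewer, wider regions than the two consensus sets combined, which is consistent with the union row above having fewer intervals than the sum of the per-condition rows.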
Use read counts over the union peak set for differential binding analysis
If the conditions have no replicates, the union of the peaks from the two conditions is used for differential binding analysis. Without replicates, significance estimates are not meaningful.
Annotate peaks with gene regions
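ChIPseeker's annotatePeak does the real work here, using the TxDb/EnsDb packages named above. As a rough illustration only, the sketch below assigns each peak to the gene whose TSS is nearest to the peak centre and labels it a promoter peak when that distance is within an assumed ±3 kb window. The gene records are hypothetical, and ChIPseeker's own genomic-feature categories are richer than this two-way label.

```python
from typing import List, Optional, Tuple

# Hypothetical gene records: (gene_id, chrom, tss, strand)
Gene = Tuple[str, str, int, str]


def annotate_peak(chrom: str, start: int, end: int, genes: List[Gene],
                  promoter_window: int = 3000) -> Tuple[Optional[str], Optional[int], str]:
    """Return (nearest gene, strand-aware distance from its TSS to the peak centre, label)."""
    centre = (start + end) // 2
    best: Optional[Tuple[str, int]] = None
    for gene_id, g_chrom, tss, strand in genes:
        if g_chrom != chrom:
            continue
        # Positive distance: peak centre lies downstream of the TSS on the gene's strand.
        dist = centre - tss if strand == "+" else tss - centre
        if best is None or abs(dist) < abs(best[1]):
            best = (gene_id, dist)
    if best is None:
        return None, None, "no gene on chromosome"
    label = "promoter" if abs(best[1]) <= promoter_window else "non-promoter"
    return best[0], best[1], label


genes = [("GENE_A", "chr1", 1000, "+"), ("GENE_B", "chr1", 5000, "-")]
print(annotate_peak("chr1", 1100, 1300, genes))  # ('GENE_A', 200, 'promoter')
print(annotate_peak("chr1", 9000, 9200, genes))  # ('GENE_B', -4100, 'non-promoter')
```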
Save the differential results and the count matrix. If there are replicates in each condition, only significant binding sites are saved; otherwise, all binding sites are saved.
If there are replicates in each condition, sites are ordered by p-value; otherwise, they are ordered by absolute effect size.
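The saving and ordering rules can be sketched as follows. The significance cutoff (FDR < 0.05, matching the volcano-plot threshold) and the field names fold, pvalue and fdr are assumptions for illustration; the report does not state which cutoff was used when saving results.

```python
from typing import Dict, List


def select_and_order(sites: List[Dict], has_replicates: bool,
                     fdr_cutoff: float = 0.05) -> List[Dict]:
    """With replicates: keep significant sites, ordered by p-value.
    Without replicates: keep every site, ordered by |fold change| (largest first)."""
    if has_replicates:
        significant = [s for s in sites if s["fdr"] < fdr_cutoff]
        return sorted(significant, key=lambda s: s["pvalue"])
    return sorted(sites, key=lambda s: abs(s["fold"]), reverse=True)


sites = [
    {"id": "a", "fold": 1.0, "pvalue": 0.01, "fdr": 0.04},
    {"id": "b", "fold": -3.0, "pvalue": 0.001, "fdr": 0.01},
    {"id": "c", "fold": 2.0, "pvalue": 0.2, "fdr": 0.5},
]
print([s["id"] for s in select_and_order(sites, has_replicates=True)])   # ['b', 'a']
print([s["id"] for s in select_and_order(sites, has_replicates=False)])  # ['b', 'c', 'a']
```

Sorting on the absolute fold change in the no-replicate case ranks strong losses and strong gains together, since p-values carry no useful information there.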
The table above shows the selected columns. Here is the description of the full output:
Volcano plot (binding sites with a q-value < 0.05 are shown in red)
Binding sites were ranked by adjusted p-values.
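The q-values used for colouring are assumed here to be Benjamini-Hochberg adjusted p-values (the FDR column DiffBind reports). A minimal sketch of that adjustment and of the volcano-plot coordinates, with points coloured red when q < 0.05, is shown below; fold changes are assumed to be on the log2 scale, and the input values are made up.

```python
import math
from typing import List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_points(folds: List[float], pvalues: List[float],
                   q_cutoff: float = 0.05) -> List[Tuple[float, float, str]]:
    """(fold change, -log10 p-value, colour) per site; p-values must be > 0."""
    qvalues = bh_adjust(pvalues)
    return [(f, -math.log10(p), "red" if q < q_cutoff else "black")
            for f, p, q in zip(folds, pvalues, qvalues)]


points = volcano_points([2.0, -1.5, 0.1], [0.001, 0.5, 0.02])
print([colour for _, _, colour in points])  # ['red', 'black', 'red']
```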
Compute the principal components (PCs) and the variance explained by up to the first 10 PCs (four PCs are available here)
| PC | Proportion of Variance (%) | Cumulative Proportion of Variance (%) |
|---|---|---|
| PC1 | 83.24 | 83.24 |
| PC2 | 16.27 | 99.51 |
| PC3 | 0.2627 | 99.77 |
| PC4 | 0.2289 | 100 |
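Both variance columns follow from the PC standard deviations: the proportion for PC k is its variance divided by the total variance over all PCs, and the cumulative column is the running sum. The sketch assumes the standard deviations come from a PCA such as R's prcomp (its sdev component); the numbers in the example are made up, not taken from this dataset.

```python
from itertools import accumulate
from typing import List, Tuple


def variance_table(sdev: List[float], max_pcs: int = 10) -> List[Tuple[str, float, float]]:
    """Rows of (PC name, % variance, cumulative %) for up to `max_pcs` components."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)  # proportions are relative to all PCs, not just those shown
    percent = [100 * v / total for v in variances]
    cumulative = list(accumulate(percent))
    return [(f"PC{k + 1}", percent[k], cumulative[k])
            for k in range(min(max_pcs, len(sdev)))]


for row in variance_table([2.0, 1.0, 1.0]):
    print(row)
```

As a check on the table above, 83.24 + 16.27 = 99.51, matching the cumulative value reported for PC2.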
PCA plots are generated from the first two principal components, colored by known factors (e.g., Status, Tissue, or Donor).
Binding sites were ranked by p-value. Counts were normalized by sequencing depth, and log2-transformed values were used for plotting.
For studies with replicates, if more than 500 significant peaks were identified, only show significant binding peaks; otherwise, show all peaks.
For studies without replicates, show all peaks.
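The report says only that counts were normalized by sequencing depth and log2-transformed for plotting. The sketch below assumes counts-per-million scaling with a pseudocount of 1 before the log2, which is one common choice rather than necessarily the pipeline's. It also encodes the 500-peak rule for choosing which sites to plot, again with an assumed FDR < 0.05 significance cutoff and a hypothetical fdr field.

```python
import math
from typing import Dict, List


def log2_cpm(counts: List[List[float]], library_sizes: List[float],
             pseudocount: float = 1.0) -> List[List[float]]:
    """counts[site][sample] -> log2(counts per million + pseudocount)."""
    return [[math.log2(c / lib * 1e6 + pseudocount) for c, lib in zip(row, library_sizes)]
            for row in counts]


def sites_to_plot(sites: List[Dict], has_replicates: bool,
                  min_significant: int = 500, fdr_cutoff: float = 0.05) -> List[Dict]:
    """Plot only significant sites when there are replicates and more than
    `min_significant` of them; otherwise plot every site."""
    if not has_replicates:
        return sites
    significant = [s for s in sites if s["fdr"] < fdr_cutoff]
    return significant if len(significant) > min_significant else sites


values = log2_cpm([[10, 0], [30, 20]], library_sizes=[1e6, 2e6])
print(values)
```

The pseudocount keeps zero-count sites finite on the log2 scale (they map to 0).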
Heatmap of ChIP binding to TSS regions (left).
Average profile of ChIP peaks binding to TSS regions (right).
## Use significant binding sites only.
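ChIPseeker computes the TSS heatmap and average profile from peak coverage around each TSS. As a simplified, binned illustration (not ChIPseeker's implementation), each heatmap row below is one TSS and each column is a strand-oriented bin within an assumed flanking window; the average profile is the column mean. The TSS records, flank and bin sizes are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]   # (chrom, start, end), half-open
Tss = Tuple[str, int, str]        # (chrom, position, strand); hypothetical records


def tss_matrix(peaks: List[Interval], tss_list: List[Tss],
               flank: int = 3000, bin_size: int = 100) -> List[List[int]]:
    """One row per TSS, one 0/1 column per bin from -flank to +flank (upstream to downstream)."""
    n_bins = 2 * flank // bin_size
    matrix = []
    for chrom, pos, strand in tss_list:
        row = []
        for b in range(n_bins):
            lo, hi = -flank + b * bin_size, -flank + (b + 1) * bin_size
            # On the minus strand, upstream lies at higher genomic coordinates.
            start, end = (pos + lo, pos + hi) if strand == "+" else (pos - hi, pos - lo)
            row.append(int(any(c == chrom and s < end and start < e for c, s, e in peaks)))
        matrix.append(row)
    return matrix


def average_profile(matrix: List[List[int]]) -> List[float]:
    """Column means of the TSS matrix: fraction of TSSs with a peak in each bin."""
    return [sum(col) / len(matrix) for col in zip(*matrix)]


m = tss_matrix([("chr1", 950, 1050)], [("chr1", 1000, "+"), ("chr1", 5000, "+")],
               flank=200, bin_size=100)
print(average_profile(m))  # [0.0, 0.5, 0.5, 0.0]
```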
Prepare DiffBind CSV input file
Check if there are replicates in each condition.
## nrep= 2
If there are replicates in each condition, perform the following steps:
Obtain consensus peaks for each condition in the comparison
6 Samples, 25723 sites in matrix (41629 total):
| # | ID | Tissue | Factor | Condition | Treatment | Replicate | Caller | Intervals |
|---|---|---|---|---|---|---|---|---|
| 1 | SRR5309361 | ASM | RNAP2 | cond0 | EtOH | 1 | bed | 25843 |
| 2 | SRR5309362 | ASM | RNAP2 | cond0 | EtOH | 2 | bed | 34451 |
| 3 | SRR5309363 | ASM | RNAP2 | cond1 | Dex | 1 | bed | 28583 |
| 4 | SRR5309364 | ASM | RNAP2 | cond1 | Dex | 2 | bed | 28447 |
| 5 | cond0 | ASM | RNAP2 | cond0 | EtOH | 1-2 | bed | 20426 |
| 6 | cond1 | ASM | RNAP2 | cond1 | Dex | 1-2 | bed | 20547 |
Obtain the union of the consensus peaks from each condition
3 Samples, 23804 sites in matrix:
| # | ID | Tissue | Factor | Condition | Treatment | Replicate | Caller | Intervals |
|---|---|---|---|---|---|---|---|---|
| 1 | cond0 | ASM | RNAP2 | cond0 | EtOH | 1-2 | bed | 20426 |
| 2 | cond1 | ASM | RNAP2 | cond1 | Dex | 1-2 | bed | 20547 |
| 3 | cond0-cond1 | ASM | RNAP2 | cond0-cond1 | EtOH-Dex | 1-2 | bed | 23804 |
Use read counts over the union peak set for differential binding analysis
If the conditions have no replicates, the union of the peaks from the two conditions is used for differential binding analysis. Without replicates, significance estimates are not meaningful.
Annotate peaks with gene regions
Save the differential results and the count matrix. If there are replicates in each condition, only significant binding sites are saved; otherwise, all binding sites are saved.
If there are replicates in each condition, sites are ordered by p-value; otherwise, they are ordered by absolute effect size.
The table above shows the selected columns. Here is the description of the full output:
Volcano plot (binding sites with a q-value < 0.05 are shown in red)
Binding sites were ranked by adjusted p-values.
Compute the principal components (PCs) and the variance explained by up to the first 10 PCs (four PCs are available here)
| PC | Proportion of Variance (%) | Cumulative Proportion of Variance (%) |
|---|---|---|
| PC1 | 93.09 | 93.09 |
| PC2 | 6.71 | 99.8 |
| PC3 | 0.1157 | 99.92 |
| PC4 | 0.0841 | 100 |
PCA plots are generated from the first two principal components, colored by known factors (e.g., Status, Tissue, or Donor).
Binding sites were ranked by p-value. Counts were normalized by sequencing depth, and log2-transformed values were used for plotting.
For studies with replicates, if more than 500 significant peaks were identified, only show significant binding peaks; otherwise, show all peaks.
For studies without replicates, show all peaks.
Heatmap of ChIP binding to TSS regions (left).
Average profile of ChIP peaks binding to TSS regions (right).
## Use significant binding sites only.
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: bindrcpp(v.0.2.2), pander(v.0.6.1), viridis(v.0.5.0), viridisLite(v.0.3.0), RColorBrewer(v.1.1-2), gplots(v.3.0.1), ggplot2(v.3.0.0), devtools(v.1.13.5), DT(v.0.4), tidyr(v.0.8.1), ChIPseeker(v.1.14.2), DiffBind(v.2.6.6), SummarizedExperiment(v.1.8.1), DelayedArray(v.0.4.1), matrixStats(v.0.54.0), TxDb.Hsapiens.UCSC.hg38.knownGene(v.3.4.0), EnsDb.Hsapiens.v79(v.2.99.0), ensembldb(v.2.2.2), AnnotationFilter(v.1.2.0), GenomicFeatures(v.1.30.3), AnnotationDbi(v.1.40.0), Biobase(v.2.38.0), GenomicRanges(v.1.30.3), GenomeInfoDb(v.1.14.0), IRanges(v.2.12.0), S4Vectors(v.0.16.0), BiocGenerics(v.0.24.0) and rmarkdown(v.1.9)
loaded via a namespace (and not attached): backports(v.1.1.2), GOstats(v.2.44.0), Hmisc(v.4.1-1), AnnotationHub(v.2.10.1), fastmatch(v.1.1-0), plyr(v.1.8.4), igraph(v.1.2.2), lazyeval(v.0.2.1), GSEABase(v.1.40.1), splines(v.3.4.3), BatchJobs(v.1.7), crosstalk(v.1.0.0), BiocParallel(v.1.12.0), gridBase(v.0.4-7), amap(v.0.8-16), digest(v.0.6.16), BiocInstaller(v.1.28.0), htmltools(v.0.3.6), GOSemSim(v.2.4.1), GO.db(v.3.5.0), gdata(v.2.18.0), magrittr(v.1.5), checkmate(v.1.8.5), memoise(v.1.1.0), BBmisc(v.1.11), cluster(v.2.0.6), limma(v.3.34.9), Biostrings(v.2.46.0), annotate(v.1.56.2), systemPipeR(v.1.12.0), prettyunits(v.1.0.2), colorspace(v.1.3-2), blob(v.1.1.1), ggrepel(v.0.7.0), dplyr(v.0.7.6), jsonlite(v.1.5), RCurl(v.1.95-4.10), TxDb.Hsapiens.UCSC.hg19.knownGene(v.3.2.2), graph(v.1.56.0), genefilter(v.1.60.0), bindr(v.0.1.1), brew(v.1.0-6), survival(v.2.41-3), sendmailR(v.1.2-1), glue(v.1.3.0), gtable(v.0.2.0), zlibbioc(v.1.24.0), XVector(v.0.18.0), UpSetR(v.1.3.3), Rgraphviz(v.2.22.0), scales(v.1.0.0), DOSE(v.3.4.0), pheatmap(v.1.0.10), DBI(v.0.8), edgeR(v.3.20.9), Rcpp(v.0.12.18), plotrix(v.3.7-3), htmlTable(v.1.11.2), xtable(v.1.8-2), progress(v.1.1.2), foreign(v.0.8-69), bit(v.1.1-12), Formula(v.1.2-2), AnnotationForge(v.1.20.0), htmlwidgets(v.1.2), httr(v.1.3.1), fgsea(v.1.4.1), acepack(v.1.4.1), pkgconfig(v.2.0.2), XML(v.3.98-1.10), nnet(v.7.3-12), locfit(v.1.5-9.1), labeling(v.0.3), tidyselect(v.0.2.4), rlang(v.0.2.2), reshape2(v.1.4.3), later(v.0.7.3), munsell(v.0.5.0), tools(v.3.4.3), RSQLite(v.2.0), evaluate(v.0.10.1), stringr(v.1.3.0), yaml(v.2.1.18), knitr(v.1.20), bit64(v.0.9-7), caTools(v.1.17.1), purrr(v.0.2.5), RBGL(v.1.54.0), mime(v.0.5), DO.db(v.2.9), biomaRt(v.2.34.2), rstudioapi(v.0.7), compiler(v.3.4.3), curl(v.3.1), interactiveDisplayBase(v.1.16.0), geneplotter(v.1.56.0), tibble(v.1.4.2), stringi(v.1.2.3), lattice(v.0.20-35), ProtGenerics(v.1.10.0), Matrix(v.1.2-12), pillar(v.1.2.1), data.table(v.1.11.4), bitops(v.1.0-6), httpuv(v.1.4.3), 
rtracklayer(v.1.38.3), qvalue(v.2.10.0), R6(v.2.2.2), latticeExtra(v.0.6-28), hwriter(v.1.3.2), RMySQL(v.0.10.15), promises(v.1.0.1), ShortRead(v.1.36.1), KernSmooth(v.2.23-15), gridExtra(v.2.3), boot(v.1.3-20), gtools(v.3.5.0), assertthat(v.0.2.0), DESeq2(v.1.18.1), Category(v.2.44.0), rprojroot(v.1.3-2), rjson(v.0.2.20), withr(v.2.1.2), GenomicAlignments(v.1.14.2), Rsamtools(v.1.30.0), GenomeInfoDbData(v.1.0.0), rpart(v.4.1-11), grid(v.3.4.3), rvcheck(v.0.1.0), shiny(v.1.1.0) and base64enc(v.0.1-3)